A Flexible Approach for Designing Optimal Reward Functions

Authors

  • Ricardo Grunitzki
  • Bruno Castro da Silva
  • Ana L. C. Bazzan
Abstract

Defining a reward function that, when optimized, results in the rapid acquisition of an optimal policy is one of the most challenging tasks involved in applying reinforcement learning algorithms. The behavior learned by agents is directly related to the reward function they use. Existing work on the Optimal Reward Problem (ORP) proposes mechanisms for designing reward functions that facilitate fast learning, but their application is limited to specific sub-classes of single- or multi-agent reinforcement learning problems. Moreover, while these methods identify which rewards should be given in which situations, they give no indication of which features of the state or environment should be used when defining a reward function. In this paper, we address these and other issues of the ORP. Experimental results on a gridworld scenario are used to evaluate the efficacy of our approach in designing effective reward functions.

Similar resources

Reward Design via Online Gradient Ascent

Recent work has demonstrated that when artificial agents are limited in their ability to achieve their goals, the agent designer can benefit by making the agent’s goals different from the designer’s. This gives rise to the optimization problem of designing the artificial agent’s goals—in the RL framework, designing the agent’s reward function. Existing attempts at solving this optimal reward pr...

Optimal Load of Flexible Joint Mobile Robots Stability Approach

The optimal load of mobile robots carrying a load with predefined motion precision is an important consideration regarding their applications. In this paper, a general formulation for finding the maximum load-carrying capacity of flexible joint mobile manipulators is presented. Meanwhile, overturning stability of the system and precision of the motion on the given end-effector trajectory are take...

Optimal flexible capacity in newsboy problem under stochastic demand and lead-time

In this paper, we consider a newsvendor who is going to invest in dedicated or flexible capacity. Our goal is to find the optimal investment policy that maximizes total profit while the newsvendor faces uncertainty in lead time and demand simultaneously. As highlighted in the literature, demand is stochastic, while lead time is constant. However, in reality lead time uncertainty decreases newsvendor's...

COVARIANCE MATRIX OF MULTIVARIATE REWARD PROCESSES WITH NONLINEAR REWARD FUNCTIONS

Multivariate reward processes with reward functions of constant rates, defined on a semi-Markov process, were first studied by Masuda and Sumita (1991). Reward processes with nonlinear reward functions were introduced in Soltani (1996). In this work we study a multivariate process , , where are reward processes with nonlinear reward functions respectively. The Laplace transform of the covar...

Function Approximation Approach for Robust Adaptive Control of Flexible joint Robots

This paper is concerned with the problem of designing a robust adaptive controller for flexible joint robots (FJR). Under the assumption of weak joint elasticity, the FJR is first modeled and converted into singular perturbation form. The control law consists of a FAT-based adaptive control strategy and a simple correction term. The first term of the controller is used to stabilize the slow dy...

Publication date: 2017